
    Improved annotation of 3' untranslated regions and complex loci by combination of strand-specific direct RNA sequencing, RNA-seq and ESTs

    The reference annotations made for a genome sequence provide the framework for all subsequent analyses of the genome. Correct annotation is particularly important when interpreting the results of RNA-seq experiments, where short sequence reads are mapped against the genome and assigned to genes according to the annotation. Inconsistencies in annotations between the reference and the experimental system can lead to incorrect interpretation of the effect on RNA expression of an experimental treatment or mutation in the system under study. Until recently, the genome-wide annotation of 3-prime untranslated regions received less attention than coding regions and the delineation of intron/exon boundaries. In this paper, data produced for samples in human, chicken and A. thaliana by the novel single-molecule, strand-specific Direct RNA Sequencing technology from Helicos Biosciences, which locates 3-prime polyadenylation sites to within +/- 2 nt, were combined with archival EST and RNA-Seq data. Nine examples are illustrated where this combination of data allowed: (1) gene and 3-prime UTR re-annotation (including extension of one 3-prime UTR by 5.9 kb); (2) disentangling of gene expression in complex regions; (3) clearer interpretation of small RNA expression; and (4) identification of novel genes. While the specific examples displayed here may become obsolete as genome sequences and their annotations are refined, the principles laid out in this paper will be of general use both to those annotating genomes and to those seeking to interpret existing publicly available annotations in the context of their own experimental data. Comment: 44 pages, 9 figures

    The Cyprinodon variegatus genome reveals gene expression changes underlying differences in skull morphology among closely related species

    Genes in durophage intersection set at 15 dpf. This is a comma-separated table of the genes in the 15 dpf durophage intersection set. edgeR results are given for each pairwise comparison. Columns indicate whether a gene is included in the intersection set at a threshold of 1.5-fold or 2-fold. (CSV 13 kb)

    Chromosomal-level assembly of the Asian Seabass genome using long sequence reads and multi-layered scaffolding

    We report here the ~670 Mb genome assembly of the Asian seabass (Lates calcarifer), a tropical marine teleost. We used long-read sequencing augmented by transcriptomics, optical and genetic mapping, along with shared synteny from closely related fish species, to derive a chromosome-level assembly with a contig N50 size over 1 Mb and scaffold N50 size over 25 Mb that span ~90% of the genome. The population structure of the L. calcarifer species complex was analyzed by re-sequencing 61 individuals representing various regions across the species' native range. SNP analyses identified high levels of genetic diversity and confirmed earlier indications of a population stratification comprising three clades, with signs of admixture apparent in the South-East Asian population. The quality of the Asian seabass genome assembly far exceeds that of any other fish species, and will serve as a new standard for fish genomics.
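The contig and scaffold N50 values quoted in this abstract follow the standard definition: the length L such that sequences of length L or longer together make up at least half of the total assembly. A minimal sketch of that calculation is given below. The function name `n50` and the toy lengths are illustrative assumptions, not values or code from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)   # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths (hypothetical, not from the seabass assembly).
print(n50([80, 70, 50, 40, 30, 20]))  # → 70
```

NG50, used in some assembly reports, is computed the same way except that the halfway point is taken from an estimated genome size rather than from the sum of the assembled lengths.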

    GLADX: An Automated Approach to Analyze the Lineage-Specific Loss and Pseudogenization of Genes

    A well-established ancestral gene can usually be found, in one or multiple copies, in different descendant species. Sometimes during the course of evolution, all the representatives of a well-established ancestral gene disappear in specific lineages; such gene losses may occur in the genome by deletion of a DNA fragment or by pseudogenization. The loss of an entire gene family in a given lineage may reflect an important phenomenon, and could be due either to adaptation or to a relaxation of selection that leads to neutral evolution. Lineage-specific gene loss analyses are therefore important for improving our understanding of the evolutionary history of genes and genomes. To perform this kind of study on the increasing number of complete genome sequences available, we developed a new software module called GLADX in the DAGOBAH framework, based on a comparative genomic approach. The software automatically detects, for all the species of a phylum, the presence or absence of a representative of a well-established ancestral gene and, through systematic re-annotation steps, confirms losses, detects and analyzes pseudogenes, and finds novel genes. The approach is based on highly reliable gene phylogenies, on protein predictions and on the analysis of genomic mutations. Together, the evidence gathered by this evolutionary approach provides accurate information for building an overall view of the evolution of a given gene in a selected phylum. The reliability of GLADX has been successfully tested on a benchmark analysis of 14 reported cases. It is the first tool able to study lineage-specific losses and pseudogenizations fully automatically. GLADX is available at http://ioda.univ-provence.fr/IodaSite/gladx/

    Structure and evolution of the gorilla and orangutan growth hormone loci

    In primates, the unigenic growth hormone (GH) locus of prosimians, expressed primarily in the anterior pituitary, evolved by gene duplications, independently in New World Monkeys (NWMs) and Old World Monkeys (OWMs)/apes, to give complex clusters of genes expressed in the pituitary and placenta. In human and chimpanzee, the GH locus comprises five genes, GH-N being expressed as pituitary GH, whereas GH-V (placental GH) and CSHs (chorionic somatomammotropins) are expressed (in human and probably chimpanzee) in the placenta; the CSHs comprise CSH-A, CSH-B and the aberrant CSH-L (possibly a pseudogene) in human, and CSH-A1, CSH-A2 and CSH-B in chimpanzee. Here the GH locus in two additional great apes, gorilla (Gorilla gorilla gorilla) and orangutan (Pongo abelii), is shown to contain six and four GH-like genes respectively. The gorilla locus possesses six potentially expressed genes, gGH-N, gGH-V and four gCSHs, whereas the orangutan locus has just three functional genes, oGH-N, oGH-V and oCSH-B, plus a pseudogene, oCSH-L. Analysis of regulatory sequences, including promoter, enhancer and P-elements, shows significant variation; in particular, the proximal Pit-1 element of GH-V genes differs markedly from that of other genes in the cluster. Phylogenetic analysis shows that the initial gene duplication led to distinct GH-like and CSH-like genes, and that a second duplication provided separate GH-N and GH-V. However, evolution of the CSH-like genes remains unclear. Rapid adaptive evolution gave rise to the distinct CSHs after the first duplication, and to GH-V after the second duplication. Analysis of transcriptomic databases derived from gorilla tissues establishes that the gGH-N, gGH-V and several gCSH genes are expressed, but the significance of the many CSH genes in gorilla remains unclear.

    A high quality assembly of the Nile Tilapia (Oreochromis niloticus) genome reveals the structure of two sex determination regions

    Background: Tilapias are the second most farmed fishes in the world and a sustainable source of food. Like many other fish, tilapias are sexually dimorphic and sex is a commercially important trait in these fish. In this study, we developed a significantly improved assembly of the tilapia genome using the latest genome sequencing methods and show how it improves the characterization of two sex determination regions in two tilapia species. Results: A homozygous clonal XX female Nile tilapia (Oreochromis niloticus) was sequenced to 44X coverage using Pacific Biosciences (PacBio) SMRT sequencing. Dozens of candidate de novo assemblies were generated and an optimal assembly (contig NG50 of 3.3Mbp) was selected using principal component analysis of likelihood scores calculated from several paired-end sequencing libraries. Comparison of the new assembly to the previous O. niloticus genome assembly reveals that recently duplicated portions of the genome are now well represented. The overall number of genes in the new assembly increased by 27.3%, including a 67% increase in pseudogenes. The new tilapia genome assembly correctly represents two recent vasa gene duplication events that have been verified with BAC sequencing. A total of 146Mbp of additional transposable element sequence is now assembled, a large proportion of which consists of recent insertions. Large centromeric satellite repeats are assembled and annotated in cichlid fish for the first time. Finally, the new assembly identifies the long-range structure of both a ~9Mbp XY sex determination region on LG1 in O. niloticus and a ~50Mbp WZ sex determination region on LG3 in the related species O. aureus. Conclusions: This study highlights the use of long read sequencing to correctly assemble recent duplications and to characterize repeat-filled regions of the genome. The study serves as an example of the need for high quality genome assemblies and provides a framework for identifying sex determining genes in tilapia and related fish species.

    Transcriptome Analysis of Female and Male Xiphophorus maculatus Jp 163 A

    Background: Xiphophorus models are important for melanoma, sex determination and differentiation, ovoviviparity and evolution. To gain a global view of the molecular mechanism(s) whereby gene expression may influence sexual dimorphism in Xiphophorus and to develop a database for future studies, we performed a large-scale transcriptome study. Methodology/Principal Findings: The 454-FLX massively parallel DNA sequencing platform was employed to obtain 742,771 and 721,543 reads from 2 normalized cDNA libraries generated from whole adult female and male X. maculatus Jp 163 A, respectively. The reads assembled into 45,538 contigs (here, a "contig" is a set of contiguous sequences), of which 11,918 shared homology to existing protein sequences. These numbers suggest that the contigs may cover 53% of the total Xiphophorus transcriptome. Putative translations were obtained for 11,918 cDNA contigs, of which 3,049 amino acid sequences contain Pfam domains and 11,064 contigs encode secretory proteins. A total of 3,898 contigs were associated with 2,781 InterPro (IPR) entries and 5,411 contigs with 132 KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways. There were 10,446 contigs annotated with 69,778 gene ontology (GO) terms and the three corresponding organizing principles. Fifty-four potential sex differentially expressed genes have been identified from these contigs. Eight and nine of these contigs were confirmed by real-time PCR as female and male predominantly expressed genes, respectively. Based on annotation results, 34 contigs were predicted to be differentially expressed in male and female, and 17 of them were also confirmed by real-time PCR. Conclusions/Significance: This is the first report of an annotated overview of the transcriptome of X. maculatus and identification of sex differentially expressed genes. These data will be of interest to researchers using the Xiphophorus model. This work also provides an archive for future studies in molecular mechanisms of sexual dimorphism and evolution, and can be used in comparative studies of other fish.